A Spectro-Temporal Framework for Compensation of Reverberation for Speech Recognition
Abstract
The objective of this thesis is to develop signal processing and analysis techniques that sharply improve speech recognition accuracy in highly reverberant environments. Speech is a natural medium of communication for humans, and over the last decade speech technologies such as automatic speech recognition (ASR) and voice response systems have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments include noise and reverberation that degrade system performance. The key focus of the thesis is the robustness of ASR to reverberation. We first provide a new framework to adequately and efficiently represent reverberation in speech feature domains. Although the framework incurs modeling approximation errors, we believe it provides a good basis for developing reverberation compensation algorithms. Based on the framework, we successfully develop a number of dereverberation algorithms. The algorithms reduce the uncertainty involved in dereverberation by using speech knowledge in the form of cepstral auto-correlation, cepstral distribution, and the non-negativity and sparsity of spectral values. We demonstrate the success of our algorithms under clean training as well as matched training. Apart from dereverberation, we also provide an approach to noise robustness via a temporal-difference operation in the speech spectral domain. There, via a theoretical analysis, we predict an expected improvement in the SNR threshold shift for white-noise conditions. We also empirically quantify and study speech-feature-level distortion with respect to speech-signal-level additive noise. Finally, we provide a new framework for joint representation and compensation of reverberation and noise. The new framework generalizes the spectral-domain reverberation framework by incorporating an additive noise term.
Working under the new framework, we combine our dereverberation and noise compensation approaches for better dereverberation as well as for the most challenging speech recognition task, which includes both noise and reverberation.
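The abstract does not give the framework's equations, so the following is only an illustrative sketch under stated assumptions rather than the thesis's formulation. It assumes reverberation can be approximated as a per-channel convolution across frames in a non-negative spectral domain, with an optional additive noise term, and it uses a plain first-order frame difference as a stand-in for the temporal-difference operation. The function names `reverberate` and `temporal_difference`, the kernel layout, and the choice of a first-order difference are all hypothetical.

```python
def reverberate(clean, kernel, noise=None):
    """Assumed model: per-channel convolution across frames, plus additive noise.

    clean:  list of frames; each frame is a list of non-negative spectral values
    kernel: list of taps; each tap is a list of per-channel weights (tap 0 = direct path)
    noise:  optional list of frames with the same shape as `clean`
    """
    n_frames, n_chan = len(clean), len(clean[0])
    out = []
    for n in range(n_frames):
        frame = []
        for k in range(n_chan):
            acc = 0.0
            # Sum contributions from the current and earlier frames.
            for m, tap in enumerate(kernel):
                if n - m >= 0:
                    acc += tap[k] * clean[n - m][k]
            if noise is not None:
                acc += noise[n][k]  # additive noise term of the joint model
            frame.append(acc)
        out.append(frame)
    return out


def temporal_difference(spec):
    """First-order difference along the frame axis; the first frame is kept as-is."""
    out = [list(spec[0])]
    for n in range(1, len(spec)):
        out.append([cur - prev for cur, prev in zip(spec[n], spec[n - 1])])
    return out
```

In this toy model, a single clean frame is smeared into later frames by the kernel, and a noise floor that is constant over time cancels under the frame difference from the second frame onward. That gives one intuition for why differencing could help with stationary noise, though the thesis's actual analysis is not reproduced here.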
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters temporally tracked and temporal tra...
Spectro-temporal processing for blind estimation of reverberation time and single-ended quality measurement of reverberant speech
Auditory spectro-temporal representations of reverberant speech are investigated for blind estimation of reverberation time (RT) and for single-ended measurement of speech quality. The auditory representations are obtained from an eight-filter filterbank which is used to extract the modulation spectra from temporal envelopes of the speech signal. Gaussian mixture models (GMM), one for each mod...
Multi-stream to many-stream: using spectro-temporal features for ASR
We report progress in the use of multi-stream spectro-temporal features for both small and large vocabulary automatic speech recognition tasks. Features are divided into multiple streams for parallel processing and dynamic utilization in this approach. For small vocabulary speech recognition experiments, the incorporation of up to 28 dynamically-weighted spectro-temporal feature streams along w...
The NTU-ADSC Systems for Reverberation Challenge 2014
This paper describes our speech enhancement and recognition systems developed for the Reverberation Challenge 2014. To enhance the noisy and reverberant speech for human listening, besides using conventional methods such as delay and sum beamformer and late reverberation reduction by spectral subtraction, we also studied a novel learning-based speech enhancement. Specifically, we train deep neu...
Publication date: 2011